<h1>1. Caption Evaluation</h1>
<p>This page describes the <span class="fontItalic">caption evaluation code</span> used by COCO and provides instructions for submitting results to the evaluation server. The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes multiple common metrics, including BLEU, METEOR, ROUGE-L, and CIDEr (the writeup below contains references and descriptions of each metric). If you use the captions, evaluation code, or server, we ask that you cite <a href="http://arxiv.org/abs/1504.00325" target="_blank">Microsoft COCO Captions: Data Collection and Evaluation Server</a>:</p>
<div class="json fontSmall">
  @article{capeval2015,
  <div class="jsontab">
    Author={X. Chen and H. Fang and TY Lin and R. Vedantam and S. Gupta and P. Dollár and C. L. Zitnick}, <br/>
    Journal = {arXiv:1504.00325}, <br/>
    Title = {Microsoft COCO Captions: Data Collection and Evaluation Server}, <br/>
    Year = {2015}
  </div>
  }
</div>
<p>To obtain results on the COCO test set, for which ground truth annotations are hidden, generated results must be submitted to the <span class="fontItalic">evaluation server</span>. The exact same evaluation code, described below, is used to evaluate generated captions on the test set.</p>

<h1>2. Evaluation Code</h1>
<p>The evaluation code is available on the <a href="http://github.com/tylin/coco-caption" target="_blank">coco-caption GitHub</a> page. Unlike the general COCO API, the caption evaluation code is available only in Python. Before running the evaluation code, please prepare your results in the format described on the <a href="#format-results">results format</a> page.</p>
<p>Running the evaluation code produces two data structures that summarize caption quality: <span class="fontMono">evalImgs</span>, which holds the scores for each image, and <span class="fontMono">eval</span>, which holds the scores aggregated across the entire evaluated set. Details for the two data structures are given below. We recommend running the <a href="https://github.com/tylin/coco-caption/blob/master/cocoEvalCapDemo.ipynb" target="_blank">Python caption evaluation demo</a> for more details.</p>

<div class="json">
  <div class="jsonreg">evalImgs[{</div>
  <div class="jsonk">"image_id"     </div><div class="jsonv">: int,        </div>
  <div class="jsonk">"BLEU_1"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"BLEU_2"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"BLEU_3"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"BLEU_4"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"METEOR"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"ROUGE_L"      </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"CIDEr"        </div><div class="jsonv">: float,      </div>
  <div class="jsonreg">}]</div>
</div><br/>

<div class="json">
  <div class="jsonreg">eval{</div>
  <div class="jsonk">"BLEU_1"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"BLEU_2"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"BLEU_3"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"BLEU_4"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"METEOR"       </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"ROUGE_L"      </div><div class="jsonv">: float,      </div>
  <div class="jsonk">"CIDEr"        </div><div class="jsonv">: float,      </div>
  <div class="jsonreg">}</div>
</div>
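<p>As a rough illustration of how the two structures might be consumed, the sketch below ranks images by a per-image metric and averages that metric. The sample values and the helper names <span class="fontMono">lowest_scoring</span> and <span class="fontMono">mean_per_image</span> are hypothetical; the key names follow the listings above, and the keys emitted by the released code may be spelled differently.</p>

```python
from statistics import mean

# Hypothetical per-image results shaped like the evalImgs listing above.
# The scores are made up for illustration only.
eval_imgs = [
    {"image_id": 1, "BLEU_1": 0.70, "BLEU_2": 0.52, "BLEU_3": 0.38,
     "BLEU_4": 0.27, "METEOR": 0.24, "ROUGE_L": 0.51, "CIDEr": 0.9},
    {"image_id": 2, "BLEU_1": 0.45, "BLEU_2": 0.28, "BLEU_3": 0.17,
     "BLEU_4": 0.10, "METEOR": 0.15, "ROUGE_L": 0.36, "CIDEr": 0.3},
    {"image_id": 3, "BLEU_1": 0.61, "BLEU_2": 0.43, "BLEU_3": 0.30,
     "BLEU_4": 0.21, "METEOR": 0.20, "ROUGE_L": 0.46, "CIDEr": 0.6},
]


def lowest_scoring(eval_imgs, metric, k=1):
    """Return the image_ids of the k images with the lowest `metric` score."""
    ranked = sorted(eval_imgs, key=lambda entry: entry[metric])
    return [entry["image_id"] for entry in ranked[:k]]


def mean_per_image(eval_imgs, metric):
    """Average a per-image metric over all images.

    This is only a rough summary: corpus-level metrics such as BLEU are
    not simple means of per-image scores, so the result may differ from
    the corresponding value in the aggregated eval structure.
    """
    return mean(entry[metric] for entry in eval_imgs)


worst_two = lowest_scoring(eval_imgs, "CIDEr", k=2)
average_cider = mean_per_image(eval_imgs, "CIDEr")
```

<p>Sorting by a per-image score is a simple way to find captions worth inspecting by hand, while the aggregated <span class="fontMono">eval</span> values remain the numbers to report.</p>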

<h1>3. Upload Results</h1>
<p>The rest of this page describes the <i>upload instructions</i> for submitting results to the caption <a href="https://competitions.codalab.org/competitions/3221" target="_blank">evaluation server</a>. Submitting results allows you to participate in the <a href="#captions-2015">COCO Captioning Challenge 2015</a> and compare results to the state-of-the-art on the <a href="#captions-leaderboard">captioning leaderboard</a>.</p>
<p><i>Training Data</i>: The recommended training set for the captioning challenge is the COCO 2014 Training Set. The COCO 2014 Validation Set may also be used for training when submitting results on the test set. External data of any form is allowed, with one exception: any form of annotation on the COCO Testing set is forbidden. Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.</p>
<p>Please limit the number of entries to the captioning challenge to a reasonable number, e.g. one entry per paper. To avoid overfitting to the test data, the <i>number of submissions per user is limited to 1 upload per day and a maximum of 5 submissions per user</i>. It is not acceptable to create multiple accounts for a single project to circumvent this limit. The one exception is a group that publishes two papers describing unrelated methods; in this case both sets of results can be submitted for evaluation.</p>
<p>First you need to create an account on <a href="https://codalab.org" target="_blank">CodaLab</a>. From your account you will be able to participate in all COCO challenges.</p>
<p>Before uploading your results to the evaluation server, you will need to create two JSON files containing your captioning results in the correct <a href="#format-results">results format</a>. One file should correspond to your results on the 2014 validation dataset, and the other to the 2014 test dataset. Both sets of results are required for submission. Your files should be named as follows:</p>
<div class="json fontMono">
  <div class="jsonreg">
    <b>results.zip</b><br/>
    &nbsp; captions_val2014_[alg]_results.json<br/>
    &nbsp; captions_test2014_[alg]_results.json
  </div>
</div>
<p>Replace [alg] with your algorithm name and place both files into a single zip file named "results.zip".</p>
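<p>The packaging step can be scripted. The following is a minimal sketch using only the Python standard library; the helper name <span class="fontMono">write_results_zip</span>, the algorithm name <span class="fontMono">myalg</span>, and the sample captions are hypothetical. It assumes each result entry has an <span class="fontMono">image_id</span> and a <span class="fontMono">caption</span> field (please confirm against the results format page) and that both files belong at the top level of the archive.</p>

```python
import json
import os
import tempfile
import zipfile


def write_results_zip(val_captions, test_captions, alg, out_dir):
    """Write both result files into out_dir/results.zip and return its path.

    Each captions argument is a list of {"image_id": int, "caption": str}
    entries (an assumed layout; check the results format page).
    """
    zip_path = os.path.join(out_dir, "results.zip")
    splits = (("val2014", val_captions), ("test2014", test_captions))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for split, captions in splits:
            name = f"captions_{split}_{alg}_results.json"
            # Store each file at the top level of the archive (assumption).
            zf.writestr(name, json.dumps(captions))
    return zip_path


# Hypothetical captions for illustration only.
val_captions = [{"image_id": 1, "caption": "a dog running on grass"}]
test_captions = [{"image_id": 2, "caption": "a plate of food on a table"}]

out_dir = tempfile.mkdtemp()
zip_path = write_results_zip(val_captions, test_captions, "myalg", out_dir)
```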
<p>To submit your zipped result file to the <a href="https://competitions.codalab.org/competitions/3221" target="_blank">COCO Captioning Challenge</a>, click on the “Participate” tab on the CodaLab webpage. When you select “Submit / View Results” you will be given the option to submit new results. Please fill in the required fields and click “Submit”. A pop-up will prompt you to select the results zip file for upload. After the file is uploaded, the evaluation server will begin processing. To view the status of your submission, please select “Refresh Status”. Please be patient; the evaluation may take quite some time to complete. If the status of your submission is “Failed”, please check that your files are named correctly, that they have the right <a href="#format-results">format</a>, and that your zip file contains two files corresponding to the validation and testing datasets.</p>
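<p>Some of the failure causes listed above can be caught locally before uploading. The sketch below is one possible pre-upload check, not the server's own validation: the function <span class="fontMono">check_results_zip</span> and its messages are hypothetical, and it only verifies the file count, the naming pattern, that both splits are present, and that each file parses as JSON.</p>

```python
import json
import os
import re
import tempfile
import zipfile

# Naming pattern taken from the listing above; [alg] may be any non-empty name.
NAME_PATTERN = re.compile(r"^captions_(val2014|test2014)_(.+)_results\.json$")


def check_results_zip(zip_path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        if len(names) != 2:
            problems.append(f"expected 2 files, found {len(names)}")
        splits = set()
        for name in names:
            match = NAME_PATTERN.match(name)
            if match is None:
                problems.append(f"unexpected file name: {name}")
                continue
            splits.add(match.group(1))
            try:
                json.loads(zf.read(name))
            except ValueError:
                problems.append(f"not valid JSON: {name}")
        for split in ("val2014", "test2014"):
            if split not in splits:
                problems.append(f"missing results for {split}")
    return problems


# Build a small, correctly named archive and check it.
demo_dir = tempfile.mkdtemp()
good_zip = os.path.join(demo_dir, "results.zip")
with zipfile.ZipFile(good_zip, "w") as zf:
    zf.writestr("captions_val2014_myalg_results.json", "[]")
    zf.writestr("captions_test2014_myalg_results.json", "[]")

print(check_results_zip(good_zip))  # prints []
```

<p>A passing check does not guarantee that the server accepts the submission, since the contents of each entry are not validated here.</p>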
<p>After you submit your results to the evaluation server, you can control whether your results are publicly posted to the CodaLab leaderboard. To toggle the public visibility of your results, please select either “post to leaderboard” or “remove from leaderboard”. For now, only one result can be published to the leaderboard at any time; we may change this in the future. After your results are posted to the CodaLab leaderboard, your captions on the validation dataset will be publicly available. Your captions on the test set will not be publicly released.</p>
<p>In addition to the <a href="https://competitions.codalab.org/competitions/3221#results" target="_blank">CodaLab leaderboard</a>, we also host our own more detailed <a href="#captions-leaderboard">leaderboard</a> that includes additional results and method information (such as paper references). Note that the CodaLab leaderboard may contain results not yet migrated to our own leaderboard.</p>
<p>After evaluation is complete and the server shows a status of “Finished”, you will have the option to download your evaluation results by selecting “Download evaluation output from scoring step.” The zip file will contain five files:</p>
<div class="json fontMono">
  <div class="jsonreg">
    <div class="jsonblk">
      captions_val2014_[alg]_evalimgs.json<br/>
      captions_val2014_[alg]_eval.json<br/>
      captions_test2014_[alg]_eval.json<br/>
      metadata<br/>
      scores.txt
    </div>
    <div class="jsonblk">
      % per image evaluation on val <br/>
      % aggregated evaluation on val <br/>
      % aggregated evaluation on test <br/>
      % auto generated (safe to ignore) <br/>
      % auto generated (safe to ignore)
    </div>
  </div>
</div>
<p>The format of the JSON eval files is described earlier on this page. Please note that the per-image *_evalimgs.json file is available for download only for the validation dataset, not for the test set.</p>
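<p>The downloaded output can be read without unpacking it. The sketch below parses every JSON member of the zip and skips the auto-generated files; the helper name <span class="fontMono">load_eval_output</span> and the in-memory archive with its score are hypothetical stand-ins for a real download.</p>

```python
import io
import json
import zipfile


def load_eval_output(zip_source):
    """Parse each .json member of the output zip into a dict keyed by file name.

    zip_source may be a path or a binary file-like object. Other members,
    such as metadata and scores.txt, are skipped.
    """
    parsed = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                parsed[name] = json.loads(zf.read(name))
    return parsed


# Hypothetical stand-in for a downloaded output zip; the score is made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("captions_val2014_myalg_eval.json", json.dumps({"CIDEr": 0.85}))
    zf.writestr("scores.txt", "CIDEr: 0.85")

results = load_eval_output(buffer)
```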
